Integrate Automated QDQ placement tool - part 3.3#839
Conversation
Review skipped: auto incremental reviews are disabled on this repository. Check the settings in the CodeRabbit UI.
📝 Walkthrough

These changes introduce a command-line interface and core workflow orchestration for ONNX Q/DQ autotuning. The CLI entry point parses configuration arguments, validates inputs, initializes TensorRT benchmarking, and invokes a region-pattern autotuning workflow that profiles models, applies quantization schemes, benchmarks performance, and exports optimized variants.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (run_autotune)
    participant Validator as Input Validator
    participant Benchmark as Benchmark Init
    participant Workflow as Autotuning Workflow
    participant Model as ONNX Model
    participant TensorRT as TensorRT Engine
    participant Output as Model Export
    User->>CLI: Invoke with arguments
    CLI->>Validator: Validate model & baseline paths
    Validator-->>CLI: Path valid / exit
    CLI->>Benchmark: Initialize benchmark instance
    Benchmark->>TensorRT: Configure with timing cache & plugins
    TensorRT-->>Benchmark: Instance ready
    Benchmark-->>CLI: Benchmark initialized
    CLI->>Workflow: Invoke region_pattern_autotuning_workflow
    Workflow->>Model: Load ONNX model
    Workflow->>Model: Load pattern cache & QDQ baseline
    Workflow->>Workflow: Profile regions & apply node filters
    loop For each region
        Workflow->>Workflow: Generate quantization schemes
        Workflow->>Model: Apply Q/DQ to region
        Workflow->>TensorRT: Benchmark model
        TensorRT-->>Workflow: Latency result
    end
    Workflow->>Output: Export optimized model
    Output-->>Workflow: Export complete
    Workflow->>Output: Save state checkpoint
    Output-->>Workflow: State saved
    Workflow-->>CLI: Return autotuner result
    CLI-->>User: Exit with status
```
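The control flow in the diagram can be sketched as a small Python driver. All names below (`schemes_for`, `apply_qdq`, `benchmark`, `export`) are hypothetical stand-ins for illustration, not the real ModelOpt API:

```python
def autotune(model, regions, schemes_for, apply_qdq, benchmark, export):
    """Toy driver mirroring the sequence diagram: for each region, try each
    quantization scheme, benchmark it, and export the fastest variant."""
    best_latency, best_scheme = float("inf"), None
    for region in regions:
        for scheme in schemes_for(region):
            candidate = apply_qdq(model, region, scheme)  # insert Q/DQ nodes
            latency = benchmark(candidate)                # e.g. TensorRT timing
            if latency < best_latency:
                best_latency, best_scheme = latency, scheme
    export(model, best_scheme)                            # save best model
    return best_latency, best_scheme
```

In the real workflow the benchmark step is backed by a TensorRT engine build plus timed inference runs, and state is checkpointed between regions.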
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~50 minutes. Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@modelopt/onnx/quantization/autotune/__main__.py`:
- Around line 107-116: init_benchmark_instance can return None on failure but
the current flow continues; update the caller (the block after
log_benchmark_config) to check the return value of init_benchmark_instance (when
called with use_trtexec=args.use_trtexec,
plugin_libraries=args.plugin_libraries, timing_cache_file=args.timing_cache,
warmup_runs=args.warmup_runs, timing_runs=args.timing_runs,
trtexec_args=trtexec_args) and if it returns None, log an error and exit early
(e.g., sys.exit(1)) so the script fails fast instead of producing misleading
infinite benchmark results.
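The fail-fast pattern the review asks for can be sketched as a thin wrapper; `init_benchmark_or_exit` is a hypothetical helper, and the only assumption taken from the comment is that the initializer returns an instance on success and `None` on failure:

```python
import logging
import sys

logger = logging.getLogger(__name__)


def init_benchmark_or_exit(init_fn, **kwargs):
    """Call a benchmark initializer and abort the CLI if it fails.

    init_fn is assumed to return a benchmark instance on success and
    None on failure, as described in the review comment above.
    """
    instance = init_fn(**kwargs)
    if instance is None:
        logger.error("Benchmark initialization failed; aborting autotuning.")
        sys.exit(1)  # fail fast instead of reporting infinite latencies
    return instance
```

The caller would pass the same keyword arguments it currently forwards (`use_trtexec`, `plugin_libraries`, `timing_cache_file`, and so on).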
In `@modelopt/onnx/quantization/autotune/workflows.py`:
- Around line 239-246: The Config instantiation currently hardcodes verbose=True
which forces noisy logging; change the call that constructs Config (the
Config(...) in this file) to accept a verbose parameter (e.g., verbose=verbose
or verbose=args_verbose) and thread that boolean from the CLI invocation that
creates/starts the autotuner (update the CLI call site to pass args.verbose into
the function that triggers this code), ensuring logger.info stays unchanged but
Config uses the provided verbose flag instead of True.
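Threading the flag through looks roughly like this; the `Config` below is a minimal stand-in with only the field under discussion, not the real class:

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Minimal stand-in for the real Config; only the field under discussion.
    verbose: bool = False


def build_config(cli_verbose: bool) -> Config:
    # Thread the CLI flag through instead of hardcoding verbose=True.
    return Config(verbose=cli_verbose)
```

At the CLI call site this becomes `build_config(args.verbose)`, so quiet runs stay quiet by default.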
Force-pushed from 09e136a to e3ad6da
Please add a test for workflows. Example: https://github.com/gcunhase/TensorRT-Model-Optimizer/blob/85228103a29662c721d862cb1cec38b0193699f5/tests/unit/onnx/quantization/autotune/test_workflows.py#L36
Force-pushed from 88b34a4 to ebc6087
Added, please check.
/ok to test 0414b81
Force-pushed from 0414b81 to 1aa4818
@willg-nv, I'm seeing the following errors in the
Force-pushed from 8f7fe19 to 95b4a5e
@willg-nv the precommit fixes are working, thank you! One last thing is the
Force-pushed from a2e3016 to 4bf8fbd
done
Pull request overview
This PR implements the command-line interface (CLI) for the ONNX Q/DQ autotuning framework, completing part 3.3 of the automated QDQ placement tool integration. The PR builds upon the benchmark module (PR #837) and QDQAutotuner class (PR #838), providing a complete end-to-end workflow for automated quantization optimization of ONNX models using pattern-based region analysis and TensorRT performance measurement.
Changes:
- Added CLI (`__main__.py`) with comprehensive argument parsing for model paths, quantization parameters, TensorRT benchmarking configuration, and workflow control
- Implemented high-level workflow orchestration (`workflows.py`) managing pattern-based region optimization, state persistence, baseline comparison, and benchmarking
- Extended common data structures with `PatternSchemes`, `PatternCache`, and `Config` classes for managing quantization schemes, caching patterns, and configuration
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file

| File | Description |
|---|---|
| `modelopt/onnx/quantization/autotune/__main__.py` | CLI implementation with argument parsing, input validation, and workflow invocation |
| `modelopt/onnx/quantization/autotune/workflows.py` | Workflow functions for benchmark initialization, pattern-based autotuning, and region filtering |
| `modelopt/onnx/quantization/autotune/common.py` | Extended with `PatternSchemes`, `PatternCache`, and `Config` dataclasses for scheme management and serialization |
| `tests/unit/onnx/quantization/autotune/test_config.py` | Unit tests for `Config` class default values, custom values, and parameter validation |
| `tests/gpu/onnx/quantization/autotune/test_workflow.py` | GPU test for quantized model export with Q/DQ insertion |
| `tests/_test_utils/onnx/quantization/autotune/models.py` | Test helper for creating simple ONNX models for autotuner testing |
## What does this PR do?

This PR integrates the benchmark module into the QDQ autotuner. The benchmark module is used to evaluate ONNX model performance. This PR is 1/3 of #703. Once all small PRs are merged, #703 can be closed.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No, documentation will be added in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are merged.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added ONNX quantization autotuning capabilities with a consolidated module providing streamlined import paths for core components.
  * Introduced unified benchmarking framework supporting TensorRT-based model evaluation with both command-line and Python API implementations.
  * Added support for timing cache persistence, custom plugin libraries, shape validation, and dynamic input shape configuration for flexible model testing and optimization.

---------

Signed-off-by: Will Guo <[email protected]>
Force-pushed from 8a363da to bee717a
## Review: PR #839 - Integrate Automated QDQ placement tool - part 3.3

### Overall Assessment

This PR implements the CLI and high-level workflows for QDQ autotuning, completing the autotuner package. The code is well-structured with comprehensive CLI arguments, good workflow orchestration, and thorough tests. Several minor issues need addressing before merge.

### ✅ What's Good

### 🚨 Critical Issues

None identified; the code is production-ready after addressing the medium-priority items.
| File | Changes | Purpose |
|---|---|---|
| `__main__.py` | +302 | CLI entry point with argparse |
| `workflows.py` | +376 | High-level autotuning workflow |
| `common.py` | +551 | PatternSchemes, PatternCache, Config |
| `test_workflow.py` | +82 | GPU integration tests |
| `test_config.py` | +97 | Unit tests for Config |
| `models.py` | +4 | Update test model dimensions |
### ✅ Recommendations

Before merge:
1. Fix `strip()` usage: use `removesuffix(".onnx")` instead
2. Add missing `default_dq_dtype` to the Config docstring
3. Fix docstring indentation in the Config class

Nice to have:
4. Consider `tempfile.TemporaryDirectory()` in tests
5. Consider clarifying the list modification logic in `add_pattern_schemes`
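The first recommendation is worth a concrete illustration: `str.strip` interprets its argument as a *set of characters* to remove from both ends, not as a suffix, so it can eat into the filename stem, whereas `str.removesuffix` (Python 3.9+) removes only an exact trailing match:

```python
name = "annex.onnx"

# strip(".onnx") removes any of the characters {'.', 'o', 'n', 'x'}
# from both ends, corrupting the stem:
assert name.strip(".onnx") == "anne"

# removesuffix removes only the exact trailing suffix:
assert name.removesuffix(".onnx") == "annex"
```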
Overall: This PR is well-implemented and necessary for the QDQ autotuning CLI. The high-level workflow abstraction is clean, and the CLI design is user-friendly with good examples and help text.
@willg-nv could you take a look at 2, 3 and the simplification suggestion?
### 🔴 Security Issue Identified

File: (line ~32)

Issue: Unauthorized comment bypassing a Bandit security check.

Per security policy, such comments require:

Neither condition is met. Please remove the comment and use a safer path, or obtain approval with justification.

Suggested fix:

cc: @NVIDIA/modelopt-setup-codeowners
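The flagged comment is most likely a Bandit `# nosec` suppression on a hardcoded temporary path (the exact line is not shown in this excerpt, so this is an assumption). As a general pattern, creating the scratch location with `tempfile` avoids the warning without any suppression:

```python
import os
import tempfile

# Instead of a hardcoded "/tmp/..." path silenced with a Bandit
# `# nosec` comment, create a unique per-run scratch directory:
cache_dir = tempfile.mkdtemp(prefix="autotune_")
timing_cache = os.path.join(cache_dir, "timing.cache")
```

`tempfile.mkdtemp` creates the directory with owner-only permissions, which is what checks like Bandit B108 are meant to enforce.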
🔴 Security Issue: Unauthorized
False alarm, no file path.
@cjluo-nv all other Copilot comments are resolved.
## What does this PR do?

This PR implements the QDQAutotuner class. This class drives the main autotuner workflow. The workflow:
1. uses RegionSearch to build regions
2. generates QDQ ONNX models and evaluates performance
3. saves the best model

This PR is part 2/4 of #703.

PR 3.1: #837
PR 3.2: #838
PR 3.3: #839

**Overview:** ?

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Not in this part.
- **Did you add or update any necessary documentation?**: No, documentation will be updated in part 4.
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No, the changelog will be updated when all changes are ready.

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Introduced ONNX Q/DQ autotuning framework with automatic region discovery and pattern-based optimization.
  * Added model profiling and quantization scheme generation capabilities.
  * Enabled state persistence and quantization model export functionality.
  * Introduced configuration management for quantization parameters and profiling workflows.

---------

Signed-off-by: Will Guo <[email protected]>
Signed-off-by: Will Guo <[email protected]>
Force-pushed from 9ae36c7 to ef161e8
/ok to test ef161e8
Signed-off-by: Will Guo <[email protected]>
Head branch was pushed to by a user without write access
/ok to test c8274b8
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #839   +/-   ##
=======================================
- Coverage   72.15%   72.05%   -0.10%
=======================================
  Files         210      210
  Lines       23515    23549      +34
=======================================
+ Hits        16967    16968       +1
- Misses       6548     6581      +33
```

☔ View full report in Codecov by Sentry.
What does this PR do?
This PR implements the QDQ autotuner CLI. This is the initial version of the CLI; it will be integrated into modelopt.onnx.quantization.autotune.
Usage:
PR 3.1: #837
PR 3.2: #838
PR 3.3: #839
Overview: ?
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes